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ABSTRACT 



Context. Propagation of charged cosmic-rays in the Galaxy depends on the transport parameters, whose number can 
be large depending on the propagation model under scrutiny. A standard approach for determining these parameters 
C*j ] is a manual scan, leading to an inefficient and incomplete coverage of the parameter space. 

O |. Aims. In analyzing the data from forthcoming experiments, a more sophisticated strategy is required. An automated 

statistical tool is used, which enables a full coverage of the parameter space and provides a sound determination of the 
O ' transport and source parameters. The uncertainties in these parameters are also derived. 

Methods. We implement a Markov Chain Monte Carlo (MCMC), which is well suited to multi-parameter determination. 
Its specificities (burn-in length, acceptance, and correlation length) are discussed in the context of cosmic-ray physics. 
Its capabilities and performances are explored in the phenomenologically well-understood Leaky-Box Model. 
Results. From a technical point of view, a trial function based on binary-space partitioning is found to be extremely 
\ efficient, allowing a simultaneous determination of up to nine parameters, including transport and source parameters, 

such as slope and abundances. Our best-fit model includes both a low energy cut-off and reacceleration, whose values 
are consistent with those found in diffusion models. A Kolmogorov spectrum for the diffusion slope (8 = 1/3) is ex- 
cluded. The marginalised probability-density function for 5 and a (the slope of the source spectra) are 5 w 0.55 — 0.60 
and a ~ 2.14 — 2.17, depending on the dataset used and the number of free parameters in the fit. All source-spectrum 
, parameters (slope and abundances) are positively correlated among themselves and with the reacceleration strength, 

but are negatively correlated with the other propagation parameters. 

Conclusions. The MCMC is a practical and powerful tool for cosmic-ray physic analyses. It can be used to confirm hy- 
potheses concerning source spectra (e.g., whether a; 7^ ay) and/or determine whether different datasets are compatible. 
OO , A forthcoming study will extend our analysis to more physical diffusion models. 
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5_| ' 1. Introduction sented the B/C ratio at 0.5-50 GeV/n (Panov et al. 2007), 
d 1 and for H to Fe fluxes at 100 GeV-100 TeV (Panov et al. 
One issue of cosmic-ray (CR) physics is the determination 2 006). At higher energy, two long-duration balloon flights 
of the transport parameters in the Galaxy. This determi- will soon prov i de spe ctra for Z=l-30 nuclei. The TRACER 
nation is based on the analysis of the secondary-to-primary collaboration has published spectra for oxygen up to iron 
ratio (e.g., B/C, sub-Fe/Fe), for which the dependence on in thc QcV/n-TcV/n range (Boyle et al. 2007; Ave et al. 
the source spectra is negligible, and the ratio remains in- 2 008). A second long-duration flight took place in summer 
stead mainly sensitive to the propagation processes (e.g., 2 006, during which the instrument was designed to have a 
Maurin et al. 2001 and references therein). For almost 20 wider dynamic-range capability and to measure lighter B, 
years, the determination of these parameters relied mostly c and N e i emeri ts. Thc CREAM experiment (Seo et al. 
on the most constraining data, namely the HEAO-3 data, 2 004) flew a cumulative duration of 70 days in December 
taken in 1979, which covered the ~ 1 - 35 GeV/n range 2004 and December 2005 (Seo et al. 2006, and preliminary 
(Engelmann et al. 1990). results in Marrocchesi et al. 2006 and Wakely et al. 2006), 
For thc first time since HEAO-3, several satellite or an d again in December 2007. A fourth flight was sched- 
balloon-borne experiments (see ICRC 2007 reporter's talk u led for December 2008 1 . Exciting data will arrive from thc 
Blasi 2008) have acquired higher quality data in the same PAMELA satellite (Picozza et al. 2007), which was success- 
energy range or covered a scarcely explored range (in terms f u ll y launched in June 2006 (Casolino et al. 2008). 
of energy, 1 TeV/n-PcV/n, or in terms of nucleus): from Wit h this wealth of new data, it is relevant to qucs- 
thc balloon-borne side, the ATIC collaboration has pre- tion t \ ic me thod used to extract the propagation parame- 
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ters. The value of these parameters is important to many 
theoretical and astrophysical questions, because they are 
linked, amongst others, to the transport in turbulent mag- 
netic fields, sources of CRs, and 7-ray diffuse emission (see 
Strong et al. 2007 for a recent review and references). It 
also proves to be crucial for indirect dark-matter detection 
studies (e.g., Donato et al. 2004, and Dclahaye et al. 2008). 
The usage in the past has been based mostly on a manual 
or semi-automated — hence partial — coverage of the param- 
eter space (e.g., Webber et al. 1992, Strong & Moskalcnko 
1998, and Jones et al. 2001). More complete scans were per- 
formed in Maurin et al. (2001, 2002), and Lionetto et al. 
(2005), although in an inefficient manner: the addition of 
a single new free parameter (as completed for example in 
Maurin et al. 2002 compared to Maurin et al. 2001) re- 
mains prohibitive in terms of computing time. To remedy 
these shortcomings, we propose to use the Markov Chain 
Monte Carlo (MCMC) algorithm, which is widely used in 
cosmological parameter estimates (e.g., Christensen et al. 
2001, Lewis & Bridle 2002, and Dunkley et al. 2005). One 
goal of the paper is to confirm whether the MCMC algo- 
rithm can provide similar benefits in CR physics. 

The analysis is performed in the framework of the 
Leaky-Box Model (LBM), a simple and widely used prop- 
agation model. This model contains most of the CR phe- 
nomenology and is well adapted to a first implementation 
of the MCMC tool. In Sect. 2, we highlight the appropri- 
ateness of the MCMC compared to other algorithms used 
in the field. In Sect. 3, the MCMC algorithm is presented. 
In Sect. 4, this algorithm is implemented in the LBM. In 
Sect. 5, we discuss the MCMC advantages and effectiveness 
in the field of CR physics, and present results for the LBM. 
We present our conclusions in Sect. 6. Application of the 
MCMC technique to a more up-to-date modelling, such as 
diffusion models, is left to a forthcoming paper. 

2. Link between the MCMC, the CR data, and the 
model parameters 

Various models describe the propagation of CRs in the in- 
terstellar medium (Webber et al. 1992; Bloemen et al. 1993; 
Strong & Moskalenko 1998; Maurin et al. 2001; Berezhko 
et al. 2003; Shibata et al. 2006; Evoli et al. 2008). Each 
model is based on his own specific geometry and has its 
own set of parameters, characterising the Galaxy proper- 
ties. The MCMC approach aims to study quantitatively 
how the existing (or future) CR measurements can con- 
strain these models or, equivalently, how in such models 
the set of parameters (and their uncertainties) can be in- 
ferred from the data. 

In practice, a given set of parameters in a propagation 
model implies, e.g., a given B/C ratio. The model param- 
eters are constrained such as to reproduce the measured 
ratio. The standard practice used to be an eye inspection 
of the goodness of fit to the data. This was replaced by the 
X 2 analysis in recent papers: assuming the x 2 statistics is 
applicable to the problem at stake, confidence intervals in 
these parameters can be extracted (see App. A). 

The main drawback of this approach is the computing 
time required to extend the calculation of the x 2 surface 
to a wider parameter space. This is known as the curse of 
dimensionality, due to the exponential increase in volume 
associated with adding extra dimensions to the parame- 
ter space, while the good regions of this space (for instance 



where the model fits the data) only fill a tiny volume. This is 
where the MCMC approach, based on the Bayesian statis- 
tics, is superior to a grid approach. As in the grid approach, 
one end-product of the analysis is the \ 2 surface, but with a 
more efficient sampling of the region of interest. Moreover, 
as opposed to classical statistics, which is based on the con- 
struction of estimators of the parameters, Bayesian statis- 
tics assumes the unknown parameters to be random vari- 
ables. As such, their full distribution — the so-called condi- 
tional probability-density function (PDF) — given some ex- 
perimental data (and some prior density for these parame- 
ters, see below) can be generated. 

To summarise, the MCMC algorithm provides the PDF 
of the model parameters, based on selected experimental 
data (e.g., B/C). The mean value and uncertainty in these 
parameters are by-products of the PDF. The MCMC en- 
ables the enlargement of the parameter space at a minimal 
computing time cost (although the MCMC and Metropolis- 
Hastings algorithms used here are not the most efficient 
one) . The technicalities of the MCMC are briefly described 
below. The reader is referred to Neal (1993) and MacKay 
(2003) for a more substantial coverage of the subject. 

3. Markov Chain Monte Carlo (MCMC) 

Considering a model depending on m parameters 



(1) 



we aim to determine the conditional PDF of the parameters 
given the data, P(0|data). This so-called posterior proba- 
bility quantifies the change in the degree of belief one can 
have in the m parameters of the model in the light of the 
data. Applied to the parameter inference, Bayes theorem is 



P(0|data) 



P(data|0) • P(0) 
P(data) ' 



(2) 



where P(data) is the data probability (the latter does not 
depend on the parameters and hence, can be considered to 
be a normalisation factor). This theorem links the posterior 
probability to the likelihood of the data C(9) = P(data|0) 
and the so-called prior probability P(9), indicating the de- 
gree of belief one has before observing the data. To extract 
information about a single parameter, 9^ a \ the posterior 
density is integrated over all other parameters #( fe ^ Q ' in 
a procedure called marginalisation. Finally, by integrating 
the individual posterior PDF further, we are able to deter- 
mine the expectation value, confidence level, or higher or- 
der mode of the parameter 0^ a > . This illustrates the techni- 
cal difficulty of Bayesian parameter estimates: determining 
the individual posterior PDF requires a high-dimensional 
integration of the overall posterior density. Thus, an effi- 
cient sampling method for the posterior PDF is manda- 
tory. For models of more than a few parameters, regular 
grid-sampling approaches are not applicable and statistical 
techniques are required (Cowan 1997). 

Among these techniques, MCMC algorithms have been 
fully tried and tested for Bayesian parameter inference 
(MacKay 2003; Neal 1993). MCMC methods explore any 
target distribution given by a vector of parameters p{9) , by 
generating a sequence of n points (hereafter a chain) 



{01, 02, • . • , On}. 



(3) 
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Each Oi is a vector of m components [as defined in Eq. (1)]. 
In addition, the chain is Markovian in the sense that the 
distribution of 6 n +i is influenced entirely by the value of 
n . MCMC algorithms are developed so that the time spent 
by the Markov chain in a region of the parameter space is 
proportional to the target PDF value in this region. Hence, 
from such a chain, one can obtain an independent sampling 
of the PDF. The target PDF as well as all marginalised 
PDF are estimated by counting the number of samples 
within the related region of parameter space. 

Below, we provide a brief introduction to an MCMC 
using the Metropolis-Hastings algorithm (see Neal 1993 and 
MacKay 2003, chapter 29 for further details and references). 

3.1. The algorithm 

The prescription that we use to generate the Markov 
chains from the unknown target distribution is the so- 
called Metropolis-Hastings algorithm. The Markov chain 
increases by jumping from the current point in the pa- 
rameter space Oi to the following 0,+i. As said before, the 
PDF of the new point only depends on the current point, 
i.e. T(0 i+ i|0i,...,0i) =T(0 i+ i|0j). This quantity defines 
the transition probability for state 0i+i from the state 0;. 
The Metropolis-Hastings algorithm specifies T to ensure 
that the stationary distribution of the chain asymptotically 
tends to the target PDF one wishes to sample from. 

At each step i (corresponding to a state 0j), a trial state 
0triai is generated from a proposal density <7(0triai|0i)- This 
proposal density is chosen so that samples can be easily gen- 
erated (e.g., a Gaussian distribution centred on the current 
state). The state ©trial is accepted or rejected depending on 
the following criterion. By forming the quantity 



a(0triai|0;) = min 1 



P(0trial) 9(0i|0tr 



ial t 



(4) 



p(6i) 9(0 t rial|0 l 

the trial state is accepted as a new state with a probability a 
(rejected with probability 1 — a). The transition probability 
is then 

T{e i+1 \9i) = a(0 tri al|0i)«(0trial|0i). (5) 

If accepted, 0,_|_i = ©trial ; whereas if rejected, the new state 
is equivalent to the current state, 0i+i = 0^. This criterion 
ensures that once at its equilibrium, the chain samples the 
target distribution p(9). If the proposal density g (©trial |0j) 
is chosen to be symmetric, it cancels out in the expression 
of the acceptance probability, which becomes: 



a = min 1 



p{0 



trial j 



P(0i) 



(6) 



We note that the process requires only evaluations of 
ratios of the target PDF. This is a major virtue of this al- 
gorithm, in particular for Bayesian applications, in which 
the normalisation factor in Eq. (2), P(data) = J P(data|0)- 
P(0)d0 is often extremely difficult to compute. Hence, the 
ratio of the target PDF, i.e. the posterior of the param- 
eter for our problem, can be calculated directly from the 
likelihood of the data and the priors. 

3.2. Chain analysis 

The chain analysis refers to the study of several proper- 
ties of the chains. The following quantities are inspected in 
order to convert the chains in PDFs. 



Burn-in length The burn-in describes the practice of remov- 
ing some iterations at the beginning of the chain to elim- 
inate the transient time needed to reach the equilibrium 
or stationary distribution, i.e., to forget the starting point. 
The burn-in length b is defined to be the number of first 
samples {0i}j=i,...,6 of the chain that must be discarded. 
The stationary distribution is reached when the chain en- 
ters the most probable parameter region corresponding to 
the region where the target function is close to its maximal 
value. To estimate b, the following criterion is used: we de- 
fine Pi/2 to be the median of the target function distribution 
obtained from the entire chain of N samples. The burn- 
in length b corresponds to the first sample ©&, for which 
p(0b)>Pi/2 (sec App. C for an illustration). 

Correlation length By construction [see Eq. (5)], each step 
of the chain depends on the previous one, which ensures 
that the steps of the chain are correlated. We can obtain 
independent samples by thinning the chain, i.e. by select- 
ing only a fraction of the steps with a periodicity chosen 
to derive uncorrelated samples. This period is estimated by 
computing the autocorrelation functions for each parame- 
ter. For a parameter 6^ a > (a = 1, . . . , to), the autocorrela- 
tion function is given by 
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which we calculate with the Fast Fourier Transformation 
(FFT). The correlation length for the a-th parameter 

is defined as the smallest j for which cj- < 1/2, i.e. the 

values £>; Q ' ) and Qyy- of the chain that are considered to be 
uncorrelated. The correlation length I for the chain, for all 
parameters, is defined to be 



1= max l {a) 

a—l,...,m 



(«) 



which is used as the period of the thinning (see App. C for 
an illustration). 



Independent samples and acceptance The independent sam- 
ples of the chain are chosen to be {0i}i=fj+zfc, where k is an 
integer. The number of independent samples Ai n d is defined 
to be the fraction of steps remaining after discarding the 
burn-in steps and thinning the chain, 



(9) 



The independent acceptance /i, lc i is the ratio of the number 
of independent samples A^ n( j to the total step number iVtoti 



/i 



ind 



iVt, 



(10) 



3.3. Choice of the target and trial functions 
3.3.1. Target function 

As already said, we wish to sample the target func- 
tion p(0) = P(0|data). Using Eq. (2) and the fact that 
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the algorithm is insensitive to the normalisation factor, 
this amounts to sampling the product P(data|A) ■ P(9). 
Assuming a flat prior P{9) = est, the target distribution 
reduces to 

p(8) = P(data|A) = £(A), (11) 
and here, the likelihood function is taken to be 

x 2 (oy 



C{9) = cxp 



The X (A) function for ridata data is 



X 2 (9) 



'"'data / ' 

k=l 



cxp 
k 



yt co {0)) 2 



(12) 



(13) 



where j/^ xp is the measured value, y^. hco is the hypothesised 
value for both a certain model and the parameters A, and 
(7k is the known variance of the measurement. For example, 
y^ xp and 2/ji heo represent the measured and calculated B/C 
ratios. 

The link between the target function, i.e., the posterior 
PDF of the parameters, and the experimental data is estab- 
lished with the help of Eqs (11) to (13). This link guarantees 
the proper sampling of the parameter space using Markov 
chains, which spend more time in more relevant regions of 
parameter space, as described above. 

3.3.2. Trial function 

Despite the effectiveness of the Metropolis-Hastings algo- 
rithm, to optimise the efficiency of the MCMC and min- 
imise the number of chains to be processed, trial functions 
should be as close as possible to the true distributions. We 
use a sequence of three trial functions to explore the param- 
eter space. The first step is a coarse determination of the 
parameter PDF. This allows us to calculate the covariancc 
matrix leading to a better coverage of parameter space, pro- 
vided that the target PDF is sufficiently close to being an 
N-dimensional Gaussian. The last step takes advantage of 
a binary-space partitioning (BSP) algorithm. 

Gaussian step For the first iteration, the proposal density 
Q (9 tried, 9i), required to obtain the trial value Atrial from 9i 
is written as 



q (Atrial, 9{) oc exp -- 



1 Atrial 



(«) z)(a) 



(14) 



These represent m independent Gaussian distributions cen- 
tred on 9i. The distribution is symmetric, so that the ac- 
ceptance probability a follows Eq. (6). The variance tr^ for 

each parameter a is to be specified. Each parameter #triai 
is hence calculated to be 



q(«) 
J trial 



where a; is a random number obeying a Gaussian distribu- 
tion centred on zero with unit variance. 

It is important to choose an optimal width a a to sample 
properly the posterior (target) distribution. If the width is 
too large, as soon as the chain reaches a region of high prob- 
ability, most of the trial parameters fall into a region of low 



probability and are rejected, leading to a low acceptance 
and a long correlation length. Conversely, for too small a 
width, the chain will take a longer time to reach the inter- 
esting regions. Eventually, even if the chain reaches these 
regions of high acceptance, only a partial coverage of the 
PDF support will be sampled (also leading to a long corre- 
lation length). 

In practice, we first define a a (a, = 1, . . . , m) equal to 
the expected range of the parameter. In a subsequent it- 
eration, a a is set to be 2v2 ln2 w 2.3 times c^ alc , i.e. the 
FWHM of the PDF obtained with the first iteration. The 
result is actually insensitive to the numerical factor used. 



Covariance matrix The proposal density is taken to be an 
N-dimensional Gaussian of covariance matrix V 



[-^(Atrial - 9 l ) T V ^Atrial -Ai 



(15) 



The covariance matrix V is symmetric and diagonalisable 
(D is a diagonal matrix of eigenvalues and P represents the 
change in the coordinate matrix), 

V = P T DP, 

and where again Eq. (6) holds. The parameters Atrial ar c 
hence found to be 

9 tlial = 0i + P T Dx, 

where a; is a vector of m random numbers following a 
Gaussian distribution centred on zero and with unit vari- 
ance. 

The covariance matrix V is estimated, e.g., from a pre- 
vious iteration using the Gaussian step. The advantage of 
this trial function with respect to the previous one is that 
it takes account of the possible correlations between the m 
parameters of the model. 



Binary Space Partitioning (BSP) A third method was devel- 
oped to define a proposal density for which the results of 
the Gaussian step or the covariance matrix iterations are 
used to subdivide the parameter space into boxes, in each 
of which a given probability is affected. 

The partitioning of the parameter space can be organ- 
ised using a binary-tree data structure known as a binary- 
space partitioning tree (de Berg et al. 2000). The root node 
of the tree is the m— dimensional box corresponding to the 
entire parameter space. The binary-space partitioning is 
then performed by dividing each box recursively into two 
child boxes if the partitioning satisfies the following require- 
ment: a box is divided only if the number of independent 
samples contained in this box is higher than a certain num- 
ber (here we used a maximum of between 3% and 0.1% 
of the total number of independent samples). When a box 
has to be divided, the division is made along the longer 
side of the box (the box-side lengths arc defined relative to 
the root-box sides). For each end node (i.e. node without 
any children), a probability, defined as the fraction of the 
number of independent samples in the box to their total 
number, is assigned. For empty boxes, a minimum proba- 
bility is assigned and all the probabilities are renormaliscd 
so that the sum of all end- node probabilities equals 1. 

The proposal density g(Atriai) is then defined, in each 
end-node box, as a uniform function equal to the assigned 
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Set random model parameters 6q 

Propagate 

Calculate p (6q ) 



for i < step 



ii' Gaussian step 



n(a) _ n( 


») , 


"trial - "i 


+ &a ■ X 



if covariance matrix step 



\e^ = e i + P r D'x\ 



random Atrial 



Set model parameters Atrial 

Propagate 

Calculate a(Atrial|Ai) 



\f a.(Bu\ a \.Oi) > random e]0, 1] 



— Atrial 
p(0i+l) =p (Atrial) 

accept ++ 



T 



Fig. 1. Flow chart of the implemented MCMC algorithm: Oi is 
a vector [Eq. (1)] of the a = 1, ... ,m free parameters of the 
model, evaluated at each step i, and is the target function 

given by Eq. (11). See text for details. 



probability. The sampling of this proposal density is sim- 
ple and efficient: an end node is chosen with the assigned 
probability and the trial parameters are chosen uniformly 
in the corresponding box. In comparison to the other two 
proposal densities, this proposal density based on a BSP is 
asymmetric, because it is only dependent on the proposal 
state g(0triai)- Hence, Eq. (4) must be used. 



4. Implementation in the propagation model 

The MCMC with the three above methods are implemented 
in the USINE package 2 , which computes the propagation of 
Galactic CR nuclei and anti-nuclei for several propagation 
models (LBM, ID and 2D diffusion models). The reader is 
referred to Maurin et al. (2001) for a detailed description 
for the nuclear parameters (fragmentation and absorption 
cross-sections), energy losses (ionisation and Coulomb), and 
solar modulation (force-field) used. 

We briefly describe how the MCMC algorithm is im- 
plemented in the propagation part (Sect. 4.1), using a 
LBM — the procedure would be similar for any other 
model. The LBM and its parameters are briefly discussed 
(Sect. 4.2) as well as the input spectrum parameters 
(Sect. 4.3). Additional information about the data are gath- 
ered in App. B. 



4.1. Flow chart 

A flow chart of the Metropolis-Hastings MCMC algorithm 
used in the context of GCRs is given in Fig. 1. To sum- 



marise, the initial values of the propagation parameters 
9 are chosen randomly in their expected range to crank 
up each Markov chain. The interstellar (IS) CR fluxes are 
then calculated for this set of parameters (see, e.g., Fig. 1 
in Maurin et al. 2002 for further details of the propagation 
steps) . The IS flux is modulated with the force-field approx- 
imation and the resulting top-of-atmosphcre (TOA) spec- 
trum is compared with the data, which allows us to calcu- 
late the x 2 value [Eq. (13)], hence the likelihood [Eq. (12)]. 
This likelihood (in practice the log-likelihood) is used to 
compute the acceptance probability [Eq. (4)] of the trial 
vector of parameters Atrial (as generated by one of the three 
trial functions described in Sect. 3.3.2). Whether the trial 
vector is accepted or rejected implies whether 6i = Atrial 
or 9 1 = 6q. This procedure is repeated for the N steps 
of the chain. Obviously, when 9i+i = Oi, the propagation 
step does not need to be repeated. Because of the nature 
of the MCMC algorithm, several chains can be executed 
in parallel. Once completed, these chains are analysed (see 
Sect. 3.2) — discarding the first step belonging to the burn- 
ing length and thinned according to the correlation length 
I [Eq. (8)] — and combined to recover the desired posterior 
PDF P(0|data). 

In this procedure, the user must decide i) the data to 
be used, ii) the observable to retain in calculating the like- 
lihood, and iii) the number of free parameters m (of the 
vector 6) for which we seek the posterior PDF. 

4.2. Leaky-Box Model (LBM) 

The LBM assumes that all CR species are confined within 
the Galaxy with an escape rate that equals N/T esc , where 
the escape time r csc is rigidity-dependent, and is written 
as t csc (R). This escape time has two origins. First, CRs 
can leak out the confinement volume and leave the Galaxy. 
Second, they can be destructed by spallation on interstel- 
lar matter nuclei. This latter effect is parameterised by the 
grammage x (usually expressed in g cm -2 ), defined as the 
column density of interstellar matter encountered by a path 
followed by a CR. The CRs that reach Earth have followed 
different paths, and can therefore be described by a gram- 
mage distribution N(x) = dN/dx. The LBM assumes that 



N(x) oc exp- A « c(R ) x 



(16) 



2 A public version will be released soon (Maurin, in prepara- 
tion). 



where the mean grammage A csc (i?) = (x) is related to the 
mass m, velocity v and escape time r csc (i?) by means of 
X esc {R) = mnvT csc (R). 

The function A esc (-R) determines the amount of spalla- 
tions experienced by a primary species, and thus determines 
the secondary-to-primary ratios, for instance B/C. From an 
experimentalist point of view, A esc (-R) is a quantity that can 
be inferred from measurements of nuclei abundance ratios. 
The grammage A csc (i?) is known to provide an effective de- 
scription of diffusion models (Berezinskii et al. 1990): it can 
be related to the efficiency of confinement (which is deter- 
mined by the diffusion coefficient and to both the size and 
geometry of the diffusion volume), spallative destruction 
(which tends to shorten the average lifetime of a CR and 
thus lower A esc ), and a mixture of other processes (such as 
convection, energy gain, and losses). 

In this paper, we compute the fluxes in the framework 
of the LBM with minimal rcaccclcration by the interstellar 
turbulence, as described in Osborne & Ptuskin (1988) and 
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Seo & Ptuskin (1994). The grarnrnagc A CSC (_R) is parame- 
ter ised as 

\ csc (R) = { XoI3R ° {S ~ SO)R ~ S ° whcnii <- R o, 
1 \q[3R~ s otherwise; 

where we allow for a break, i.e. a different slope below and 
above a critical rigidity Rq. The standard form used in the 
literature is recovered by setting 5q = 0. For the entire set 
of n nuclei, a series of n equations (see Maurin et al. 2001 
for more details) for the differential densities Ni =1 <--' n are 
solved at a given kinetic energy per nucleon E k / n (E is the 
total energy), i.e. 



■ ■ d ( dN° 



S j (E k/n ) . (18) 



In this equation, the r.h.s. term is the source term that takes 
into account the primary contribution (see Sect. 4.3), the 
spallative secondary contribution from all nuclei k heavier 
than j, and the /3-decay of radioactive nuclei into j. The 
first energy-dependent factor A 3 is given by 



j ,+ISM 



1 



ISM=H,He 

The two other terms correspond to energy losses and first- 
order reacceleration for B 3 and to second-order reaccelera- 
tion for C 3 . Following Osborne & Ptuskin (1988) and Seo 
& Ptuskin (1994), 

dE 

B =(^)ion,coul+( 1 +^ 2EK PP and C = f3 4 E*K pp , 



where 



K 



4 -i ;2 T CSC 

3 a <5(4-<5 2 )(4-5) 



(19) 



The strength of the reacceleration is mediated by the 
pscudo Alfvcnic speed V a of the scatterers in units of 
km s _1 kpc" 1 . This is related to a true speed given in a 
diffusion model with a thin disk h and a diffusive halo L 
by means of V a = V a X (hL)- 1 / 2 (Seo & Ptuskin 1994). 
Assuming typical values of h = 0.1 kpc and L = 10 kpc, 
the value of V a can be directly transposed and compared 
to a true speed V a , as obtained in diffusion models. 

To summarise, our LBM with reacceleration may in- 
volve up to five free parameters, i.e. the normalisation Ao, 
the slopes Sq and S below or above the cut-off rigidity Rq, 
and a pseudo-Alfven velocity V a related to the reaccelera- 
tion strength. 

4.3. Source spectra 

We assume that the primary source spectrum Qj(E) for 
each nuclear species j is given by (/3 = v/c) 



Q j {E) = dQ j /dE = q j f3^R- 



(20) 



where qj is the source abundance, atj is the slope of the 
species j, and the term (3 r,j manifests our ignorance about 
the low-energy spectral shape. We further assume that 
Uj = a for all j, and unless stated otherwise, rjj = r) = —1 
in order to recover dQ/dp oc p~ a , as obtained from accel- 
eration models (e.g., Jones 1994). The constraints existing 
on r\ are explored in Sect. 5.3. 



The pattern of the source abundances observed in the 
cosmic radiation differs from that of the solar system. This 
is due to a segregation mechanism during the acceleration 
stage. Two hypotheses are disputed in the literature: one is 
based on the CR composition controlled by volatility and 
mass-to-charge ratio (Meyer et al. 1997; Ellison et al. 1997), 
and the other one is based on the first ionisation potential 
(FIP) of nuclei (e.g., Cassc & Goret 1973). In this work, for 
each configuration, the source abundances are initialised 
to the product of the solar system abundances (Lodders 
2003), and the value of the FIP taken from Binns et al. 
(1989). The final fluxes are obtained by an iterative calcu- 
lation of the propagated fluxes, rcscaling the element abun- 
dances — keeping fixed the relative isotopic abundances — to 
match experimental data at each step until convergence is 
reached (see Fig. 1 in Maurin et al. 2002 for further details). 
The result is thus insensitive to the input values (more de- 
tails about the procedure are given in App. B.l). 

The measurement of all propagated isotopic fluxes 
should characterise all source spectra parameters com- 
pletely, i.e. the qj and otj parameters should be free. 
However, only clement fluxes are available, which motivates 
the above rescaling approach. In Sect. 5.3, a few calcula- 
tions are undertaken to determine self-consistently, along 
with the propagation parameters, i) a and 77, and ii) the 
source abundances for the primary species C, O, and the 
mixed N elements (the main contributors to the boron flux) . 



5. Results 

We first examine the relative merits of four different pa- 
rameterisations of the LBM, and determine the statistical 
significance of adding more parameters. These models cor- 
respond to {0 a } a =i,. ..,m<b with 

— Model I = {A , Rq,8], i.e. no reacceleration (V a = 0) 
and no break in the spectral index (Sq = 0). 

— Model II = {A , 5, V a }, i.e. no critical rigidity (i?o = 0) 
and no break in the spectral index (Sq = 0). 

— Model III = {Ao, i?o, 8, V a }, i.e. no break in the spectral 
index (Sq = 0). 

- Model IV = {Ao, Rq, Sq, 5, V a }. 

Various subsets of B/C data are used to investigate whether 
old data arc useful or just add confusion to the PDF de- 
termination. We note in Sect. 5.2 that no useful constraint 
can be drawn from p data alone. 

We also consider additional free parameters (Sect. 5.3) 
related to the source spectra, for a self-consistent determi- 
nation of the propagation and source properties. Since we 
show that a break in the slope (Model IV) is not required 
by current data, we focus on Model III (for the description 
of the propagation parameters), defining: 

— Model III+l = {Ao, Rq, 5, V a } + {a}, where the source 
slope a is a free parameter. 

- Model III+2 = {A , R , S, V a } + {a,?y}, where both the 
source slope a and the exponent 77 [of /?, see Eq. (20)] 
are free parameters. 

- Model III+4 = {A , R , 5, V a } + {a, q c , <?n, qo}, where 
the abundances qi of the most significantly contributing 
elements are also free parameters. 

- Model III+5 = {A , Rq, S, V a } + {a, 77, q c , <?n, qo}- 
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This allows us to investigate the correlations between pa- 
rameters further and into potential biases in the propaga- 
tion parameter determination. 

More details about the practical use of the trial func- 
tions can be found in App. C. In particular, the sequential 
use of the three sampling methods (Gaussian step, covari- 
ance matrix step, and then binary-space partitioning) is 
found to be the most efficient: all results presented here- 
after are based on this sequence. 

5.1. Fitting the B/C ratio 
5.1.1. HEAO-3 data alone 

We first constrain the model parameters with HEAO-3 data 
only (Engclmann et al. 1990). These data are the most pre- 
cise data available at the present day for the stable nuclei 
ratio B/C of energy between 0.62 to 35GeV/n. 

The results for the models I, II, and III are presented 
in Figs. C.2, 2 top, and 2 bottom. The inner and outer 
contours are taken to be regions containing 68% and 95% of 
the PDF respectively (see App. A.l). The first observation 
one can make for the LBM without reacceleration (Model 
I, Fig. C.2), is that the marginal distributions of the three 
LBM parameters are mostly Gaussian. The tail for small 
values of Ro is due to this parameter being constrained by 
low-energy data (< lGeV/n): there are no HEAO-3 data 
at low energy, so all Ro values below 3 GV are equiprobable 
(this remains true for Model III). 

As seen in Fig. 2, a more complicated shape for the dif- 
ferent parameters is found for Model II (top panel), and 
even more so for Model III (bottom panel). This induces 
a longer correlation length (1.5 and 6.9 steps instead of 1 
step) and hence reduces the efficiency of the MCMC (75% 
for model II and 17% for model III). Physically, the correla- 
tion between the parameters, as seen most clearly in Fig. 2 
(bottom), is understood as follows. First, Ao, Ro, and 5 are 
positively correlated. This originates in the low-energy re- 
lation A csc tx \oRq S , which should remain approximately 
constant to reproduce the bulk of the data at GeV/n en- 
ergy. Hence, if Rq or i5 is increased, Ao also increases to 
balance the product. On the other hand, V a is negatively 
correlated with 6 (and hence with all the parameters): this 
is the standard result that to reach smaller 5 (for instance 
to reach a Kolmogorov spectrum), more reacceleration is 
required. This can also be seen from Eq. (19), where at con- 
stant t csc , K pp oc V%/f(S), where / is a decreasing function 
of 6: hence, if S decreases, f(8) increases, and V a then has 
to increase to retain the balance. 

The values for the maximum of the PDF for the propa- 
gation parameters along with their 68% confidence inter- 
vals (see App. A) are listed in Table 1. The values ob- 
tained for our Model I are in fair agreement with those 
derived by Webber et al. (1998), who found {Ao, Ro, 5} = 
{38.27, 3.6, 0.7}. The difference for A could be related 
to the fact that Webber et al. (1998) rely on a mere 
eye inspection to extract the best-fit solution or/and use 
a different set of data. For example, comparing Model I 
with a combination of HEAO-3 and low-energy data 
(ACE+Voyager 1 & 2+IMP7-8, see Sect. 5.1.2) leads to 
{Ao, Ro, S} = {52, 5.3, 0.69}, slightly changing the values 
of the Model I preferred parameters (compared to the first 
line of Table 1). 




20 30 40 0.5 0.55 0.6 




20 30 40 0.5 0.55 0.6 60 80 100 




Fig. 2. Posterior distributions for Model II (top) and Model III 
(bottom) using HEAO-3 data only. For more details, refer to 
caption of Fig. C.2. 

The reacceleration mechanism was invoked in the liter- 
ature to decrease the spectral index S toward its preferred 
value of 1 /3 given by a Kolmogorov spectrum of turbulence. 
In Table 1 , the estimated propagation parameter values for 
the models II and III arc indeed slightly smaller than for 
Model I, but the Kolmogorov spectral index is excluded for 
all of these three cases (using HEAO-3 data only). This 
result agrees with the findings of Maurin et al. (2001), in 
which a more realistic two-dimensional diffusion model with 
reacceleration and convection was used. We note that the 
values for V a ~ 80 km s _1 kpc -1 , should lead to a true 
speed V a = V a X \fhL ~ 80 km s -1 in a diffusion model for 
which the thin disk half-height is h = 0.1 kpc and the halo 
size is L = 10 kpc: this is consistent with values found in 
Maurin et al. (2002). 
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9 tj+0.6 


7n+ 01 

u - ' u -0.01 
U - 00 -0.06 


75ti° 


3.35 
1.43 
1.30 



Table 1. Most probable values of the propagation parameters 
(after marginalising over the other parameters) for models I, II, 
and III using HEAO-3 alone (14 data points) and the B/C con- 
straint. The uncertainty in the parameters correspond to 68% 
CL of the marginalised PDF (see App. A). The last column 
shows the minimum x 2 /dof obtained for each model (the asso- 
ciated best-fit parameters are gathered in Table 3). 
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Fig. 3. Best-fit ratio for Model I (blue dotted), II (red dashed), 
and Model III (black solid) using the HEAO-3 data only (green 
symbols). The curves are modulated with $ = 250 GV. The 
corresponding best-fit parameters are gathered in Table 3. 

The final column in Table 1 indicates, for each model, 
the best \ 2 value per degree of freedom, x m i n /dof. This al- 
lows us to compare the relative merit of the models. LB 
models with reacceleration reproduce the HEAO-3 data 
more accurately (with % 2 /dof of 1.43 and 1.30 for the 
Models II and III respectively compared to x 2 /dof = 4.35 
for Model I). The best-fit model B/C fluxes are shown 
with the B/C HEAO-3 data modulated at $ = 250 MV 
in Fig. 3. Physically, the origin of a cutoff Rq in A osc at 
low energy can be related to convection in diffusion models 
(Jones 1979). Hence, it is a distinct process as reacceler- 
ation. The fact that Model III performs more successfully 
than Model II implies that both processes are significant, 
as found in Maurin et al. (2001). 

In the following, we no longer consider Model I and II, 
and inspect instead, the parameter dependence of Model 
III on the dataset selected. 

5.1.2. Additional constraints from low-energy data 

The actual data sets for the B/C ratio (see, e.g., Fig. 5) 
show a separation into two energy domains: the low-energy 
range extends from ~ 10 -2 GeV/n to ~ 1 GeV/n and the 
high-energy range goes from ~ 1 GeV/n to ~ 10 2 GeV/n. 
The spectral index S is constrained by high-energy data, 



Model 
Dataset 


A 
g cm~ 2 


Ro 
GV 


5 


Va 
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Xmin/dof 

-1 


III-A 
III-B 
III-C 
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III-D 

III-E 
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q 7+0.2 
°- ' -0.3 


u - dz -0.02 
u - d ' -0.02 


95±^ 
88t 3 6 


4.15 
6.08 



Table 2. Same as in Table 1, but testing different data sets 
with Model III: A = HEAO-3 (14 data points), B = HEAO-3 + 
ACE (20 data points), C = HEAO-3 + ACE + Voyager 1 & 2 + 
IMP7-8 (22 data points), D = HEAO-3 + all low-energy data 
(30 data points), E = all B/C data (69 data points). 



e.g., the HEAO-3 data, and adding low-energy data allows 
us to more reliably constrain Rq. We note that by fitting 
only the low-energy data, only the grammage crossed in 
very narrow energy domain would be constrained. 

In a first step, we add only the ACE (CRIS) data (de 
Nolfo ct al. 2006), which covers the energy range from 
- 8 • 10" 2 GeV/n to - 2 • 10 _1 GeV/n, and which is later 
referred to as dataset B (the dataset A being HEAO-3 data 
alone). The resulting posterior distributions are similar for 
the datasets B and A (B is not shown, but A is given in 
Fig. 2, bottom). Results for datasets A and B are com- 
pletely consistent (first and second line of Table 2) , but for 
the latter, propagation parameters are more tightly con- 
strained and the fit is improved (x 2 n ; n /dof=1.09). The ACE 
(CRIS) data are compatible with Rq = 0, but the preferred 
critical rigidity is 2.47 GV. 

All other low-energy data (ISEE-3, Ulysses, IMP7-8, 
Voyager 1& 2, ACE) are then included (dataset D). The 
resulting values of the propagation parameters are left un- 
changed. However, a major difference lies in the higher 
Xmin/dof of 4.15, which reflects an inconsistency between 
the different low-energy data chosen for the MCMC. If 
the data point from the Ulysses experiment is excluded, 
X 2 nin /dof decreases to a value of 2.26, and by excluding 
also the ISEE-3 data points (dataset C) it decreases fur- 
ther to 1.06 (see Table 2). Since the set of low-energy 
data have different modulation parameters, the difference 
in the results for the various data subsets becomes clearer 
after the data have been demodulated. The force-field ap- 
proximation provides a simple analytical one-to-one corre- 
spondence between the modulated top-of-the atmosphere 
(TOA) and the demodulated interstellar (IS) fluxes. For an 
isotope x, the IS and TOA energies per nucleon are related 
by E l k s = i?J OA + <I> ($ = Z/Ax 4> is the modulation param- 
eter), and the fluxes by (p x is the momentum per nucleon 
of x) 

JfrA) ^ TOA K TOA + m >< *) • (21) 

Px / 

The B/C ratio results from a combination of various iso- 
topes, and assuming the same Z/A for all isotopes, we find 
that 
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Fig. 4. HEAO-3 and ACE (CRIS) modulated (TOA, solid line) 
and demodulated (IS, dashed line) data points have been con- 
nected to guide the eye. Filled symbols (modulated and de- 
modulated) correspond to HEAO-3, ACE (CRIS), IMP7-8 and 
Voyager 1 & 2. On the TOA curve, the empty red stars and the 
blue upper triangle correspond to ISEE-3 and Ulysses. 
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Fig. 5. Best-fit B/C flux (Model III) for datasets A (thin red 
curves) and C (thick black curves). Above 300 MeV/n, B/C is 
modulated to $ = 250 MV (solid lines) appropriate for HEAO-3 
data, whereas below, it is modulated to $ = 225 MV (dashed 
lines). Model III-B, not shown, overlaps with III-C. The corre- 
sponding propagation values are gathered in Table 3. 
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Table 3. Best-fit values (corresponding to Xmin) 
for B/C data (A=HEAO-3 data alone, C=HEAO- 
3+Voyager 1 & 2+ACE+IMP7-8). 



Voyager 1 & 2 constraints slightly shifts all of the parame- 
ters to lower values. 

In a final try, we take into account all available data 
(dataset E, final line of Table 2). Many data are clearly in- 
consistent with each other (see Fig. 8), but as for the low- 
energy case, although the Xmin/dof is worsened, the pre- 
ferred values of the propagation parameters are not changed 
drastically (compare with datasets B, C, and D in Table 2). 
We await forthcoming data from CREAM, TRACER, and 
PAMELA to be able to confirm and refine the results for 
HEAO-3 data. 



The modulated and demodulated low-energy B/C data are 
shown in Fig. 4 (see caption for details). The ISEE-3 and 
Ulysses data points, as just underlined, are clearly incon- 
sistent with other data. To be consistent, $ = 200 MV for 
ISEE-3 and $ = 200 MV for Ulysses would be required. 
Significant uncertainties A$ ^25 — 50 GV arc quoted in 
general, so that it is difficult to conclude whether there 
are systematics in the measurement or if the modulation 
quoted in the papers is inappropriate. Some experiments 
have also accumulated the signal for several years, peri- 
ods during which the modulation changes. It is beyond the 
scope of this paper to discuss this issue further. Below, we 
discard both ISEE-3 and Ulysses data in selecting an ho- 
mogeneous low-energy data set, which includes the most 
recent ACE (CRIS) data. 

The resulting best-fit models, when taking low-energy 
data into account, are displayed in Fig. 5. The B/C best-fit 
ratio is displayed for Model III and the dataset A (red thin 
lines) and C (black thick lines). Model III-B (not shown) 
yields similar results to Model III-C. Solid and dashed lines 
correspond to the two modulations $ = 250 MV (HEAO- 
3 and IMP7-8) and $ = 225 MV respectively (ACE and 
Voyager 1& 2). Although the fit from HEAO-3 alone pro- 
vides a good match at low energy, adding ACE (CRIS) and 



5.1.3. Model IV: break in the spectral index 

We have already mentioned that the rigidity cut-off may be 
associated with the existence of a galactic wind in diffusion 
models. By allowing a break to occur in the spectral index 
of A csc [sec Eq. (17)], we search for a deviations from a 
single power law (So — S) or from the cut-off case (Sq = 0). 

Adding a new parameter So (Model IV) increases the 
correlation length of the MCMC, since Rq and <5o are corre- 
lated [see Eq. (17)]. The acceptance /i n d [Eq. (9)] is hence 
extremely low. For Model IV-C (i.e. using dataset C, see 
Table 2), we find / ind = 2%. The PDF for S is shown 
in the left panel of Fig. 6. The most probable values and 
68% confidence intervals obtained are {Ao, Rq, Sq, S, V a } = 

{30±|, 2.218:6, -0.6l?:l,0.55lg:ra. 7 ®±li}, which are con- 
sistent with values found for other models, as given in 
Tables 1 and 2: adding a low-energy spectral break only al- 
lows us to better adjust low-energy data (figure not shown). 
The best-fit parameters, for which Xmin = 0.86, are re- 
ported in Table 3. The small value of Xmin (smaller than 
1) may indicate an over-adjustment, which would disfavour 
the model. 

It is also interesting to compel Sq to be positive, 
to check whether So — (equivalent to Model III), 
So = S (equivalent to Model II), or any value in-between 
that is preferred. We find the most probable values to 
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Fig. 6. Marginalised PDF for the low-energy spectral index <5o 
in Model IV-C. The parameter So is either free to span both 
positive and negative values (left panel) or constrained to <5o > 
(right panel). 



be {\ ,R ,S ,S,V a } = {23^, 1+ 2 , 0+^,0.49^,102+^}. 
The corresponding PDF for So is shown in the right panel 
of Fig. 6. The maximum occurs for So = 0, which is also 
found to be the best-fit value; we checked that the best-fit 
parameters matches those given in Table 3 for Model III- 
C. A secondary peak appears at So ~ 0.5, such as So ~ S 
corresponding to Model II. The associated Xmin f° r this 
configuration is worse than that obtained with So = 0, in 
agreement with the conclusion that Model III provides a 
closer description of the data than Model II. 



5.1.4. Summary and confidence levels for the B/C ratio 

In the previous paragraphs, we have studied several models 
and B/C datasets. The two main conclusions that can be 
drawn are i) the best-fit model is Model III, which includes 
reacceleration and a cut-off rigidity, and ii) the most likely 
values of the propagation parameters are not too dependent 
on the data set used, although when data are inconsistent 
with each other the statistical interpretation of the good- 
ness of fit of a model is altered (all best-fit parameters are 
gathered in Table 3) . The values of the derived propagation 
parameters are close to the values found in similar studies 
and the correlation between the LB transport parameters 
are well understood. 

Taking advantage of the knowledge of the x 2 distribu- 
tion, we can extract a list of configurations, i.e. a list of 
parameter sets, based on CLs of the \ 2 PDF (as explained 
in App. A. 2). The x 2 distribution is shown for our best 
model, i.e. Model III, in Fig. 7. The red and black areas 
correspond to the 68% and 95% confidence intervals, which 
arc used to generate two configuration lists, from which 
68% and 95% CLs on, e.g., fluxes, can be derived 3 . 

The B/C best-fit curve (dashed blue), the 68% (red 
solid), and 95% (black solid) CL envelopes are shown in 
Fig. 8. For the specific case of the LBM, this demonstrates 
that current data arc already able to constrain strongly the 
B/C flux (as reflected by the good value Xmin = 1-06), even 
at high energy. This provides encouraging support in the 
discriminating power of forthcoming data. However, this 
conclusion must be confirmed by analysis of a more refined 



For instance, it can be used to predict the p or d background 
flux to look for a dark- matter annihilating contribution, e.g., as 
in Donato et al. (2004). Note however that the statistical proce- 
dure described here is far more robust than the crude approach 
used by Maurin et al. (2001); Donato et al. (2004). 
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Fig. 7. x 2 /dof normalised distribution for Model III-C. The 68% 
and 95% CL of the distribution are shown respectively as the 
red and black area. 
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Fig. 8. Confidence regions of the B/C ratio for Model III-C as 
calculated from all propagation parameters satisfying Eq. (A. 2). 
The blue-dashed line is the best-fit solution, red-solid line is 
68% CL and black-solid line 95% CL. Two modulation param- 
eters are used: $ = 225 MV below 0.4 GeV/n (adapted for 
ACE+ Voyager 1 & 2+IMP7-8 data) and $ = 250 MV above 
(adapted for HEAO-3 data). 



model (e.g., diffusion model), for which the situation might 
not be so simple. 

From the same lists, we can also derive the range al- 
lowed for the source abundances of elements (we did not 
try to fit isotopic abundances here, although this can be 
achieved, e.g., as in Simpson & Connell 2001 and refer- 
ences therein). The element abundances are gathered in 
Table 4, for elements from C to Si (heavier elements were 
not used in this study) . They can be compared with those 
found in Engelmann et al. (1990) (see also those derived 
from Ulysses data, Duvernois et al. 1996). For some ele- 
ments, the agreement is striking (F, Mg), and is otherwise 
fair. The difference for the main progenitors of boron, i.e. 
C, N, and O, is a bit puzzling, and is probably related 
to a difference in the input-source spectral shape. This is 
discussed further in Sect. 5.3, where we also determine self- 
consistcntly the propagation parameters along with the C, 
N, and O abundances. 
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Z Element 



(m 3 s GeV/n) 



HEAO-3 
(Engelmann et al. 



6 


C 


148.5 ± 


3. 


164.9 ± 


4.7 


7 


N 


8.1 ± 


0.6 


9.9 ± 


3.4 


8 





185. ± 


3. 


204. ± 


2.2 


9 


F 


3.67 ± 


0.05 


3.67 ± 


0.05 


10 


Ne 


24.1 ± 


0.4 


22.5 ± 


1.3 


11 


Na 


1.88 ± 


0.08 


1.25 ± 


0.5 


12 


Mg 


40.3 ± 


0.6 


40.3 ± 


1.0 


1.3 


Al 


3.69 ± 


0.1 


3.02 ± 


0.6 


14 


Si 


38.8 ± 


0.5 


38.8 ± 


0.5 



Table 4. Element source abundances, qz = 2~^i=i B ot 9*> f° r 
Model III-C (isotopic fractions are fixed to SS ones, Lodders 
2003). The central values correspond, for the best-fit model, to 
abundances rescaled to match HEAO-3 data at 10.6 GeV/n. The 
uncertainty in qi originates from the same rescaling, but arising 
from all combinations of parameters satisfying the 68% CL on 
the x 2 distribution. For HEAO-3, the numbers are taken from 
Table 7 of Engelmann et al. (1990), and have been rescaled to 
gz(Si) = 38.8 x 10~ 22 (m 3 s GeV/n) -1 to ease the comparison. 




Ek (GeV) 



Fig. 9. Demodulated anti-proton data and IS flux for Model 
(best fit on p data, red-dashed line) and for Model III-C (from 
the best-fit parameters on B/C data, black-solid line). 



5.2. Constraints from p 

In the context ofindirect dark-matter searches, the anti- 
matter fluxes (p, d and e + ) are used to look for exotic con- 
tributions on top of the standard, secondary ones. 

The standard procedure is to fit the propagation pa- 
rameters to B/C data, and apply these parameters in cal- 
culating the secondary and primary (exotic) contributions. 
The secondary flux calculated for our best-fit Model III-C is 
shown, along with the data (see App. B.2 for more details) 
in Fig. 9 (black-solid line). For this model, we can calcu- 
late the x 2 value for the p data, and we find x 2 /dof=1.86. 
The fit is not perfect, and as found in other studies (e.g., 
Dupcrray et al. 2005), the flux is somehow low at high en- 
ergy (PAMELA data are awaited to confirm this trend). 
However, we checked that these high-energy data points 
are not responsible for the large x 2 value. The latter could 
be attributed to either a small exotic contribution, a differ- 
ent propagation history for species for which A/ Z = 1 or 
i/2 k 2, or inaccurate data. 

It may therefore appear reasonable to fit directly the 
propagation parameters to the p flux, assuming that it is 
a purely secondary species. Since the fluxes of its progeni- 
tors (p and He) are well measured, this should provide an 
independent check of the propagation history. We first at- 
tempted to apply the MCMC method to the p data with 
Model III, then Model II and finally Model I. However, 
even the simplest model exhibits strong degeneracies, and 
the MCMC chains could not converge. We had to revert to a 
model with no reacceleration (V a = 0), no critical rigidity 
(Rq = 0), and no break in the spectral index (5q = 0), 
for which A csc = \ (3(R/1GV)- 5 (hereafter Model 0). 
The lcr values found for the two parameters {Ao,<5} are 

x p, Model = 10 . 2 +° : 5g. cm -2 and gp, Model = .00+ 04 . 

Hence, only one parameter (Ao) is required to reproduce the 
data, as seen in Fig. 9 (red-dashed line, Xmin/dof=1.128). 
This is understood as follows: due to the combined effect 
of modulation and the tertiary contribution (p inclastically 
interacting on the ISM, but surviving as a p of lower en- 
ergy), the true low-energy data points all correspond to p 
produced at a few GeV energies. Due to the large scattering 



in the data, it is sufficient to produce the correct amount 
of p at this particular energy to account for all of the data. 

Due to the importance of antimatter fluxes for indirect 
dark-matter searches, this novel approach could be helpful 
in the future. However, this would require a more robust 
statistics of the p flux, especially at higher energy, to lift 
the degeneracy in the parameters. 

5.3. Adding free parameters related to the source spectra 

In all previous studies (e.g., Jones et al. 2001), the source 
parameters were investigated after the propagation param- 
eters had been determined from the B/C ratio (or other 
secondary to primary ratio). We propose a more general ap- 
proach, where we fit simultaneously all of the parameters. 
With the current data, this already provides strong con- 
straints on the CR source slope a and source abundances 
(CNO). Higher-quality data are awaited to refine this anal- 
ysis. We also show how this approach can help to uncover 
inconsistencies in the measured fluxes. 

For all models below, taking advantage of the results 
obtained in Sect. 5.1, we retain Model III-C. The roman 
number refers to the free transport parameters of the model 
(111= {Ao, Ro, S, V a }), and the capital refers to the choice 
of the B/C dataset (C=HEAO-3+Voyager 1 & 2+IMP7-8, 
see Table 2). This is supplemented by source spectra pa- 
rameters and additional data for the clement fluxes. 



5.3.1. Source shape a and -q from Eq. (20) 

As a free parameter, we first add a universal source slope a. 
We then allow r\, paramcterising a universal low-energy 
shape of all spectra, to be a second free parameter. In ad- 
dition to B/C constraining the transport parameters, some 
primary species must be added to constrain a and r\. We 
restrict ourselves to O, the most abundant boron progen- 
itor, because it was measured by both the HEAO-3 ex- 
periment (Engelmann et al. 1990), and also the TRACER 
experiment (Ave et al. 2008). The modulation levels were 
$ = 250 MV for HEAO-3 and $ = 500 MV for TRACER. 
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Fig. 10. B/C ratio from best-fit models of Table 6. In addition 
to the propagation parameters, free parameters of the source 
spectra are a (thin lines, labeled 1) or a and rj (thick lines, 
labeled 2): The two models are tested on two datasets for the 
primary flux (in addition to using the B/C HEAO-3 data): O is 
as measured by HEAO-3 (black lines, dataset a) or as measured 
by TRACER (blue lines, dataset b). 



Wc cstmatcd the latter number from the solar activity at 
the time of flight (2 weeks in December 2003) as seen from 
neutron monitors data . 

In total, we test four models (denoted by la, lb, 2a, and 
2b for legibility): 



III-C+la 
III-G+lb 
III-C+2a 
III-C+2b 



{A , R , 6, V a } + {a}, with 0=HEAO-3; 
{A , R , 5, V a } + {a}, with 0=TRACER; 
{A , Ro, S, V a } + {a, ??}, with 0=HEAO-3; 
{A , R , S, V a } + {a, 77}, with O^TRACER; 



where the Arabic numbers relate to the source-spectrum 
free parameters used in the calculation, and the lower 
case relates to the chosen oxygen-flux dataset (a=HEAO-3, 
b=TRACER). The most probable parameters are gathered 
in Table 5, where, to provide a comparison, the first line re- 
ports the values found for Model III-C (i.e. with 7 = a + S 
fixed to 2.65). We remark that by adding HEAO-3 oxygen 
data to the fit (la), the propagation parameters Ao, Ro, and 
S overshoot Model III-C's results, while they undershoot 
those of Model lb (TRACER data). The parameter V a un- 
dershoots and overshoots for these two models respectively, 
since it is anti-correlated with the former parameters. As a 
consequence, the fit to B/C is worsened, especially at low 
energy (see Fig. 10). 

The top left panel of Fig. 11 shows the slopes a derived 
for Models la (solid black) and lb (dashed blue). In both 
cases, a is well constrained, but the values are inconsis- 
tent, a result that is clear because the low-energy data are 
also inconsistent: the demodulated (i.e. IS) HEAO-3 and 
TRACER oxygen data points are shown in the right panel 
of Fig. 11. To remedy this situation, we allow 77 to be a free 
parameter (family of models III+2). The net effect is to ab- 
sorb any uncertainty originating in either the modulation 
level or the source-spectrum low-energy shape. As shown 
in the bottom panel of Fig. 11, the source slopes derived 



http: / /ulysses. sr. unh.edu/NeutronMonitor/Misc/neutron2. html. 
Indeed, the solar activity between 2002 and 2004 has not varied 
much, so that we use a value for $ derived from the BESS 2002 
flight (see Fig. 2 of Shikaze et al. 2007). 



III-C+1_: {X , R 8, V ,a) 



a 



— la (0=HEAO-3) 
lb (0=TRACER) 
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—1 


(demodulated) 




: 


, , 1 
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Fig. 11. Marginalised PDF for models III-C+1 (a free param- 
eter, top left) and III-C+2 (a and 77 free parameters, bot- 
tom panels). In the three panels, the solid-black lines rely on 
HEAO-3 oxygen data, whereas the blue-dashed lines rely on 
TRACER oxygen data (thin and thick lines are as in Fig. 10). 
Top right panel: zoom in demodulated O low energy HEAO- 
3 and TRACER data. (The solid segments on TRACER data 
show the energy bin size. Uncertainty on all fluxes are present, 
but too small to notice). 



from the two experiments are now in far closer agreement 
(bottom left), with a ~ 2.15. The most probable values 
and the best-fit model values are given in Tables 5 and 
6 respectively. The effect of this action is evident in the 
low-energy slope of the source spectrum -q. As seen in the 
bottom-right panel, the two data sets contain significantly 
inconsistent ranges. The value 77tracer — —6.7 probably 
indicates that the solar modulation we chose was incorrect. 
The value ?7heao-3 — 0.3 might provide a reasonable guess 
of the low-energy shape of the source spectrum, but might 
also be a consequence of systematics in the experiment. The 
associated oxygen fluxes are shown in Fig. 12 for the best- 
fit models: as explained, models that allow 7/ to vary (thick 
lines) reproduce more accurately the data than when 7/ is 
set to be -1 (thin lines). 

Although it would be precipitate to draw any firm con- 
clusion about the low-energy shape, we can turn the argu- 
ment around to serve as a diagnosis of the low-energy data 
quality. For instance, assuming that the spectral index a of 
all elements was the same, extracting and comparing r\i for 
each of these i elements may enable us to infer some sys- 
tematics remaining in the data. It would be worth fitting 
the H and He species, which are the most reliably measured 
fluxes to date; this will be considered in a future study using 
diffusion models. 



5.3.2. a, 77 and source normalisation qi 

The final two models add, as free parameters, the CNO 
element source abundances (relative isotopic abundances 
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* III-C: propagation parameters are {Ao, Ro, S, V a } and the B/C dataset 
f Abundances are 1.65|0.10|2.04 for HEAO-3 (Engelmann et al 


is HEAO-3+Voyagerl&2+IMP7-8. 
1990, and see Table 4). 



Table 5. Most probable values of the propagation parameters (after marginalising over the other parameters) for models 

III-C+ The additional free parameters and data correspond to: la/lb ={q}, 2a/2b={a, rj}, 4a/4b ={a, qc, qN, qO}, 

5a/5b={a, 77, qc, qN, qO} with either O data is HEAO-3 (a) or TRACER (b). The uncertainty on the parameters correspond to 
68% CL of the marginalised PDF (see App. A). The associated best-fit parameters are gathered in Table 6. 




Fig. 12. Same models as in Fig. 10, but for the oxygen flux. 



are fixed to SS ones). The data used in the fit are B/C, C, 
N, and O, all measured by HEAO-3 (TRACER data for C 
and N have not yet been published). The models, which are 
denoted by short 4a and 5a in the text below, are: 

- III-C+4a: {A , Ro, 6, V a } + {a,q c , qN, qo}; 

- III-C+5a: {A , Ro, S, V a } + {a, r\,q c , qN, qo}- 

The PDF and 2D correlations plots between propaga- 
tion and source parameters are seen in Fig. 13. With nine 
parameters, the efficiency is very low (find ^ 0.05%), even 
using the BSP trial function. To obtain ~ 800 independent 
points, a total 1.6 • 10 6 steps were completed. The contours 
are not as regular as for our 4-parameter model (see Fig. 2), 
but correlations between the parameters are still clearly 
evident: we recover the Ao — Ro — S correlations (and anti- 
correlation with V a ). In addition, we note that all source- 
related parameters (a, t] and element abundances qc,N,o) 
are correlated among themselves and with V a (hence anti- 
correlated with the remaining transport parameters). This 
is especially visible for the primary species C and O, while 
less clear for the mixed species N. This is understood as the 
parameter 7, i.e. the slope of the propagated primary fluxes 
(7 = a + S) is mostly fixed by measurements: if we decrease 
5, then a must be increased to match the data. However, 



if the source slope is increased, the exponent 77 must also 
increase to match the low-energy data. The positive corre- 
lation between source abundances comes from the fact that 
relative fluxes should be preserved. 

The most probable values are gathered in Table 5. 
Compared with the respective Models la and 2a, leav- 
ing the source abundances qc, qN and qo free in 4a and 
5a does not significantly change our conclusions. Again, 
adding 77 (2a and 5a) as a free parameter allows us to ab- 
sorb the low-energy uncertainties in the data, so that we 
obtain a = 2.17 (5a) instead of the biased value of 2.13 
(4a). The same conclusions hold for other propagation pa- 
rameters. On the derived source abundances, the impact of 
adding the parameter 77 is for them to increase. The relative 
C:N:0 abundances (0= 1) are respectively 0.78 : 0.36 : 1 
(4a) and 0.82 : 0.40 : 1 (5a), the second model providing 
values slightly closer to those derived from HEAO-3 data 
0.81 : 0.49 : 1. 

The difference in the source element abundances when 
they are rescaled to match the data or including them in 
the MCMC is also seen from Table 6, which gathers the 
best-fit parameters. The next-to-last line reproduces qc,N,o 
obtained for all models: all abundances arc roughly in agree- 
ment, although our approach underlines the importance of 
taking the correlations between the parameters properly 
into account in extracting unbiased estimates of the prop- 
agation and source parameters. 

The goodness of fit for the models when applied to the 
B/C, C, N, and O data is shown in the last column of 
Table 6, in terms of the Xmin va l uc - The models in which 
qc,N,o is free do not provide a closer match between models 
and data but also a no poorer fit than when qc.N.o is fixed. 
As soon as primary fluxes are included in the fit (compared 
to Model III-C), the Xmin ^ s worsened. This is due to a 
combination of an imperfect fit to the primary fluxes and, 
as already said, a poorer B/C fit because the propagation 
parameters are optimised to match the former rather than 
the latter (B/C). The best-fit parameters are given in the 
same Table, and the associated CNO fluxes are plotted in 
Fig. 14 for illustration purposes. 
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Fig. 13. PDF (diagonal) and 2D correlations (off-diagonal) plots for the nine free parameters of Model III-C+5 when fitted on 
B/C, and CNO HEAO-3 data. 
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* These values are not extracted from the PDF: they are values of rescalcd abundances required to match HEAO-3 CNO data at 10.6 GeV/n. 
' For a comparison. HEAO-3 abundances (Engclmann et al. 1990, and see Table 4) arc 1.65|0.10|2.04. 

Table 6. Best-fit values (corresponding to Xmm) f° r au models as given in Table 5. Number of data points for the Xmm/dof 
calculation: 22 B/C data (III-C=HEAO-3+ Voyager 1 & 2+IMP7-8) plus 14 oxygen HEAO-3 data for la and 2a, 8 oxygen TRACER 
data for lb and 2b, or 14 x 3 (C, N and O) HEAO-3 data for 4a and 5a. 



5.3.3. Perspective on source spectrum parameters 

Other primary species could have been included in the \ 2 
calculation to i) constrain further a, and/or ii) to check 
the hypothesis ev^ 7^ <x,- for different species, and/or iii) di- 
agnose some problems in the data, if we believe the slope 
should be universal. However, using a few primary species 
(O or CNO) already affects the goodness of fit of B/C (com- 
pare model ni-C to others in Fig. 10). Since there are many 
more measured primary fluxes than secondary ones, taking 
too many primaries would weigh too significantly in the 



X 2 , compared to the B/C contribution, and this would di- 
vert the MCMC into regions of the parameter space that 
fit these fluxes rather than B/C. Since systematic errors 
are known to be larger in flux measurements than in ratios, 
this may lead to biased estimates of the propagation pa- 
rameters. Allowing j] to be a free parameter is a first step 
to decrease the impact of this bias (in the low-energy part 
of the spectrum). These biases are not necessarily an issue 
since we may be more interested in estimates of the source 
parameters rather than in unbiased value of the propaga- 
tion parameters. 
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Fig. 14. Carbon, nitrogen and oxygen fluxes from best-fit mod- 
els of Table 6. 

To illustrate the difficulty in data systematics, it suffices 
to say that for the most abundant species H and He, all ex- 
periments provided mutually incompatible measurements, 
until AMS and BESS experiments flew ten years ago. We 
cannot therefore expect HEAO-3 data, acquired in 1979, 
to be completely free of such drawbacks. Again, we await 
the publication of several forthcoming new data sets before 
pursuing our analysis further in this direction. 

6. Conclusion 

We have implemented a Markov Chain Monte Carlo to 
extract the posterior distribution functions of the propa- 
gation parameters in a LBM. Three trial functions were 
used, namely a standard Gaussian step, an N-dimensional 
Gaussian step and its covariance matrix, and a binary-space 
partitioning. For each method, a large number of chains 
were processed in parallel to accelerate the PDF calcula- 
tions. The three trial functions were used sequentially, each 
method providing some inputs to the next: while the first 
one was good at identifying the general range of the prop- 
agation parameters, it was not as efficient in providing an 
accurate description of the PDF. The two other methods 
provided this accuracy, and the final results were based on 
PDF obtained from the chains processed with the binary- 
space partitioning. 

Taking advantage of the sound statistical properties of 
the MCMC, confidence intervals for the propagation pa- 
rameters can be given, as well as confidence contours for 
all fluxes and other quantities derived from the propagation 
parameters. The MCMC was also used to compare the im- 
pact of choosing different datasets and ascertain the merits 
of different hypotheses concerning the propagation models. 
Concerning the first aspect, we have shown that combining 
different B/C datasets leaves mostly unchanged the propa- 
gation parameters, while strongly affecting the assessment 
of the goodness of a model. We also show that at present, 
the p data do not cover a sufficiently large energy range to 
constrain the propagation parameters, but they could be 
useful in crosschecking the secondary nature of the flux in 
the future. 



In this first paper, we have focused on the phenomcno- 
logically well-understood LBM, to ease and simplify the dis- 
cussion and implementation of the MCMC. In agreement 
with previous studies, we confirm that a model with a rigid- 
ity cutoff performs more successfully than one without and 
that reacceleration is preferred over no reacceleration. Such 
a model can be associated with a diffusion model with wind 
and reacceleration. As found in Maurin et al. (2001), the 
best-fit models demand both a rigidity cutoff (wind) and 
reacceleration, but do not allow us to reconcile the diffu- 
sion slope with a Kolmogorov spectrum for turbulence. An 
alternative model with two slopes for the diffusion was used, 
but it is not favoured by the data. In a last stage, we allowed 
the abundance and slope of the source spectra to be free 
parameters, as well as the element abundances of C, N, and 
O. This illustrated a correlation between the propagation 
and source parameters, potentially biasing the estimates of 
these parameters. The best-fit model slope for the source 
abundances was a « 2.17 using HEAO-3 data, compatible 
with the value a w 2.14 for TRACER data. The MCMC 
approach allowed us to draw confidence intervals for the 
propagation parameters, the source parameters, and also 
for all fluxes. 

A wealth of new data on Galactic CR fluxes are ex- 
pected soon. As illustrated for the LBM, the MCMC is a ro- 
bust tool in handling complex data and model parameters, 
where one has to fit simultaneously all source and propa- 
gation parameters. The next step is to apply this approach 
to more realistic diffusion models and larger datasets, on a 
wider range of nuclear species. 
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Appendix A: Best fit, goodness of a model, most 
probable values and confidence levels/intervals 

The best-fit model parameters are given by a unique set 
of parameters for which the \ 2 value of Eq. (13) is mini- 
mized; the goodness of fit of a model is given by Xmin/dof- 
On the other hand, the most probable value for each pa- 
rameter 0i is defined as the maximum , p? nax = "P(0' nax ) of 
its PDF (after marginalising). The most probable # max and 
best-fit model parameters Q hest do not necessarily coincide, 
especially when correlations exist between parameters. The 
best-fit model parameters are best suited to providing the 
most likely CR fluxes, whereas the ID marginalised PDF 
provides directly the most likely value of the parameter. 

A.l. Confidence levels/intervals on parameters 

Confidence intervals (CI), associated with a confidence level 
(CL), are constructed from the PDF. The asymmetric in- 
terval A x = [0f ax - 6~, 0f ax + 9+] such as 

CL(x)= / V(8 i )&6 i = \- 1 , (A.l) 

defines the 1 — 7 confidence level (CL), along with the CI of 
the parameter 0j. Here, the CIs (i.e 6~ and 0+) are found 
by decreasing the value V(0i) from "P 4 max to Vf, such that 
1 — 7 = x. This is easily generalised to 2D confidence levels 
in constructing 2D confidence intervals as shown later in 
correlation plots. Below, we use the x — 68% and x = 95% 
CLs, corresponding to la and 2er uncertainties. 
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A. 2. Confidence intervals on fluxes 

The best-fit model fluxes (e.g., B/C, O, p) are calcu- 
lated from the best-fit model parameters. Confidence lev- 
els in these quantities cannot be obtained from the ID 
marginalised CIs of the parameters. They must be con- 
structed from a sampling of the (still) correlated param- 
eters. This is achieved by using all sets of parameters 
{0}x%cl = {0*}i=i-p, for which x 2 (#i) falls in the x% 
confidence level of the x 2 PDF. Once these sets are found, 
we simply calculate the desired flux for all the sets: the 
maximum and minimum values arc kept for each energy 
bin, defining confidence envelopes for this flux. Thus, the 
main task is to construct confidence intervals for the x 2 
distribution. 

For n parameters in the large sample limit — where the 
joint PDF for the estimator of the parameters and the like- 
lihood function become Gaussian — , the CI is given by 

[Xmin, Xmin + A X 2 ], where A X 2 = Q 7 (l - 7, Tl) 

is the quantile of order 1 — 7 (confidence level CL) of the 
\ 2 distribution (Cowan 1997). However, by applying the 
MCMC, we have access to a direct sampling of the X 2 dis- 
tribution. Hence, independently of the statistical meaning 
of a model, the confidence interval is extracted from the 
cumulative x 2 PDF, by requiring that 

rxL„+&x 2 

/ V(x 2 )d X 2 = 1-7- (A.2) 

We nevertheless checked that both approaches provide very 
similar results. For instance, the CIs (for Model III-C) ob- 
tained directly from Fig. 7 are CI (68%) = [xl, in , Xmi n + 4 - 9 ] 
and CI (95%) = [Xmin>Xmin + 9 - 2 L whereas they are 
CI (68%) = [x 2 „ i „,X 2 nin + 4.7] and CI (95%) = [xl in ,xLn + 
9.5] when calculated from the Q 7 (l — 7, n) quantilcs (Cowan 
1997). 

Appendix B: Data 

In the paper, we focus on the B/C ratio, which is the most 
accurate tracer of the propagation parameters (other trac- 
ers, such as the sub-Fe/Fe or the quartet 1 H, 2 H, 3 He and 
4 He are not considered). We also estimate the potential 
of the p, a secondary species, as an alternative species for 
constraining these parameters. We describe below the typi- 
cal configurations used to calculate the corresponding spec- 
trum as well as the associated datasets used. 

B. l. B/C 

The default configuration for nuclei is the following: the 
value of the observed propagated slope 7 = a + 5, unless 
stated otherwise, is set to be 2.65 (Ave et al. 2008), and 
source abundances of the most abundant species (C, N, 
O, F, Ne, Na, Mg, Al, Si) are rescaled to match HEAO-3 
data at 10.6 GeV/n. Boron is assumed to be a pure sec- 
ondary species. Only elements lighter than Si are propa- 
gated, since they are the only relevant ones for determining 
B/C (Maurin et al. 2001). 

For B/C at intermediate GeV energies, we use HEAO- 
3 data (Engclmann et al. 1990). They are complemented 
at low energy by the ACE (CMS) data (de Nolfo et al. 



2006). For a few model iterations, we also look for combined 
constraints from multiple sets of data. A collection of low- 
energy data is formed by data sets for the IMP7-8 (Garcia- 
Munoz et al. 1987), ISEE-3 (Krombel & Wiedenbeck 1988), 
Ulysses (Duvernois et al. 1996), and Voyager 1&2 (Lukasiak 
et al. 1999) spacecrafts. At higher energy, we consider sev- 
eral balloon flights (Lezniak & Webber 1978; Orth et al. 
1978; Simon et al. 1980; Dwyer & Meyer 1987), the ATIC- 
2 balloon-borne experiment (Panov et al. 2007), and the 
Spacclab-2 experiment (Mueller et al. 1991). For element 
fluxes, HEAO-3 and TRACER results (Ave et al. 2008) are 
used. 

B.2. p 

For the calculation of the p flux, the situation is simpler: the 
production of this secondary flux can be directly linked to 
the accurate measurements of propagated p and He fluxes, 
by the AMS (Alcaraz et al. 2000b,a; AMS Collaboration 
et al. 2000) and BESS experiments (Sanuki et al. 2000; 
Shikaze et al. 2007). For more details of the p flux calcula- 
tion (cross sections, source terms, . . . ), the reader is referred 
to Donato et al. (2001). 

For the p data, we consider the AMS 98 (AMS 
Collaboration et al. 2002) experiment on the shuttle, the 
balloon-borne experiments IMAX 92 (Mitchell et al. 1996), 
CAPRICE 94 (Boezio et al. 1997), WIZARD-MASS 91 
(Basini 1999), CAPRICE 98 (Boezio et al. 2001), and the 
series of BESS balloon flights BESS 95+97 (Orito et al. 
2000), BESS 98 (Maeno et al. 2001), BESS 99 and 2000 
(Asaoka et al. 2002), and BESS 2002 (Haino & et al. 2005). 
We also add the BESS Polar results (BESS Collaboration 
et al. 2008). For BESS data, we use the solar modulation 
level as provided in Shikaze et al. (2007), based on proton 
and helium data. 

Appendix C: Illustration of MCMC chains and PDF 
found with the three trial functions q(6 tv - 1SL \,6i) 

To compare the three trial functions (Sect. 3.3.2), a 
simple setup is retained: the B/C ratio observed from 
HEAO-3 data is used to constrain the model parameters 
{#,}, 1 :■■ = {A , Ro, 6}, i.e. Model I. 

Taking advantage of parallel processing (as underlined 
in Sect. 4.1), we combine several chains of 10 000 steps for 
each trial function. We start with the Gaussian trial func- 
tion. It combines N c = 40 chains and its output is used 
to calculate the covariance matrix. Taking advantage of 
smaller burn-in and correlation lengths, only 20 chains need 
to be combined for the covariance trial function. Again, the 
resulting PDFs are used as input to the BSP trial function, 
which also combines 20 chains. Several chains (shown here 
for S), along with their burn- in length and correlation step 
I, are shown in Fig. C.l. 

The result of the three sampling methods (Gaussian, 
covariance matrix, and BSP) for the PDF are shown in 
Fig. C.2. The insert in each panel provides mean values 
(over the number of chains processed N c ) for relevant pa- 
rameters of the chain analysis (Sect. 3.2): a decrease in 
the burn-in length b (421.5, 20.4 and 2.6) and the correla- 
tion length I (159.7, 6 and 1) is found when moving from 
the Gaussian to the BSP sampling method. The fraction 
of independent samples [as defined in Eqs. (9) and (10)] 
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Fig. C.2. Posterior PDF of {6^} a =i....,3 = {Ao, Ro,5} using the 3 proposal densities defined in Sect. 3.3.2 (Gaussian — top left; 
Covariance matrix — top right; BSP — bottom right). In these three panels, the diagonal shows the ID marginalised posterior density 
function of the indicated parameter. The number of entries in the normalised histograms corresponds to the number of uncorrelated 
steps spent by the chain for values of Ao, Ro and 5. Off-diagonal plots show the 2D marginalised posterior density functions for the 
parameters in the same column and same line respectively: Ao — Ro, Ao — 5 and Rq — S. The colour code corresponds to the regions 
of increasing probability (from paler to darker shade). The two contours (smoothed) delimit regions containing respectively 68% 
and 95% (inner and outer contour) of the PDF. Finally, the bottom-left panel shows an example of a 3D parameter mesh (3 slices 
along the three different planes Ao — S — Ro) obtained from the BSP method. 



is /ind = 0.7% for the Gaussian step, while nearly every 
step is valid and uncorrelated (total of 99.9%) for the BSP 
mode. This confirms that, for a given number of steps, re- 
fined trial functions are more efficient in extracting the PDF 
(note however that some improvement comes from the fact 
that each method takes advantage of the previous step of 
calculation). 

Figure C.2 (bottom- left) illustrates the binary-space 
partitioning discussed in Sect. 3.3.2. It shows the projec- 
tions of box sides on the three 2D planes Ao — Rq, Xq — S 
and Rq — 8 of the parameter space. The partitioning has 
been produced by the BSP method using the covariance 
matrix procedure presented on the same figure (top-right), 



where the box density is clearly proportional to the esti- 
mated target density. 
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